Research Paper: A Globally Optimal k-Anonymity Method for the De-Identification of Health Data
نویسندگان
چکیده
BACKGROUND Explicit patient consent requirements in privacy laws can have a negative impact on health research, leading to selection bias and reduced recruitment. Often legislative requirements to obtain consent are waived if the information collected or disclosed is de-identified. OBJECTIVE The authors developed and empirically evaluated a new globally optimal de-identification algorithm that satisfies the k-anonymity criterion and that is suitable for health datasets. DESIGN Authors compared OLA (Optimal Lattice Anonymization) empirically to three existing k-anonymity algorithms, Datafly, Samarati, and Incognito, on six public, hospital, and registry datasets for different values of k and suppression limits. Measurement Three information loss metrics were used for the comparison: precision, discernability metric, and non-uniform entropy. Each algorithm's performance speed was also evaluated. RESULTS The Datafly and Samarati algorithms had higher information loss than OLA and Incognito; OLA was consistently faster than Incognito in finding the globally optimal de-identification solution. CONCLUSIONS For the de-identification of health datasets, OLA is an improvement on existing k-anonymity algorithms in terms of information loss and performance.
منابع مشابه
A Globally Optimal k-Anonymity Method for the De-Identification of Health Data
Objective: The authors developed and empirically evaluated a new globally optimal de-identification algorithm that satisfies the k-anonymity criterion and that is suitable for health datasets. Design: Authors compared OLA (Optimal Lattice Anonymization) empirically to three existing k-anonymity algorithms, Datafly, Samarati, and Incognito, on six public, hospital, and registry datasets for diff...
متن کاملImproved Univariate Microaggregation for Integer Values
Privacy issues during data publishing is an increasing concern of involved entities. The problem is addressed in the field of statistical disclosure control with the aim of producing protected datasets that are also useful for interested end users such as government agencies and research communities. The problem of producing useful protected datasets is addressed in multiple computational priva...
متن کاملA novel local search method for microaggregation
In this paper, we propose an effective microaggregation algorithm to produce a more useful protected data for publishing. Microaggregation is mapped to a clustering problem with known minimum and maximum group size constraints. In this scheme, the goal is to cluster n records into groups of at least k and at most 2k_1 records, such that the sum of the within-group squ...
متن کامل(α, k)-Anonymity: An Enhanced k-Anonymity Model for Privacy-Preserving Data Publishing
Privacy preservation is an important issue in the release of data for mining purposes. The k-anonymity model has been introduced for protecting individual identification. Recent studies show that a more sophisticated model is necessary to protect the association of individuals to sensitive information. In this paper, we propose an (α, k)-anonymity model to protect both identifications and relat...
متن کاملAchieving k-Anonymity by Clustering in Attribute Hierarchical Structures
Individual privacy will be at risk if a published data set is not properly de-identified. k-anonymity is a major technique to de-identify a data set. A more general view of k-anonymity is clustering with a constraint of the minimum number of objects in every cluster. Most existing approaches to achieving k-anonymity by clustering are for numerical (or ordinal) attributes. In this paper, we stud...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Journal of the American Medical Informatics Association : JAMIA
دوره 16 5 شماره
صفحات -
تاریخ انتشار 2009